Evaluation of the J2000 groundwater component

Cluster analysis of ADES piezometers

Author

R. C. Kubina

Published

16-04-2024

1 Cluster analysis

A cluster analysis was conducted on the regimes of all piezometers in the Loire catchment. The data have not yet been quality-checked. For each station, monthly means were calculated and z-score normalised, and the long-term mean for each calendar month was then derived from the normalised series. Stations whose resulting regime contained NAs were excluded from clustering.
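
The preprocessing steps above can be sketched in base R as follows. This is a minimal illustration, not the code used for the report: the data frame `df`, its columns `station`, `date` and `height`, and the toy values are assumptions made for the example, and the z-score is assumed to be computed per station over its monthly series.

```r
# Toy input: daily levels for two hypothetical stations over three years.
dates <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
doy   <- as.numeric(format(dates, "%j"))
df <- data.frame(
  station = rep(c("A", "B"), each = length(dates)),
  date    = rep(dates, times = 2),
  height  = c(100 + sin(2 * pi * doy / 365), 50 + cos(2 * pi * doy / 365))
)
df$year  <- as.integer(format(df$date, "%Y"))
df$month <- as.integer(format(df$date, "%m"))

# Step 1: monthly mean per station, year and month.
monthly <- aggregate(height ~ station + year + month, data = df, FUN = mean)

# Step 2: z-score normalisation of each station's monthly series.
monthly$z <- ave(monthly$height, monthly$station,
                 FUN = function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE))

# Step 3: long-term mean per calendar month, arranged as a
# station x month matrix (one regime per row).
regime <- with(monthly, tapply(z, list(station = station, month = month), mean))

# Step 4: drop stations whose regime contains NAs.
regime <- regime[complete.cases(regime), , drop = FALSE]
```

The resulting `regime` matrix has one row per station and twelve columns, which is the shape expected by the clustering functions used later.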

1.1 Hierarchical clustering

Figure 1: Cluster dendrogram for hierarchical clustering
Figure 2: Regimes for 4 clusters - Hierarchical clustering
Figure 3: Map - Hierarchical clustering
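
For orientation, a hierarchical clustering of regimes into 4 clusters could look like the sketch below. The distance measure (Euclidean) and the linkage (Ward) are assumptions, since the report does not state them, and the `regime` matrix is toy data built from four hypothetical seasonal shapes.

```r
# Toy regime matrix: 8 hypothetical stations x 12 months, built from four
# seasonal shapes that differ only in the timing of the annual peak.
set.seed(42)
months <- 1:12
shapes <- sapply(c(0, 3, 6, 9), function(shift) cos(2 * pi * (months - shift) / 12))
regime <- t(shapes[, rep(1:4, each = 2)]) + matrix(rnorm(8 * 12, sd = 0.05), nrow = 8)
rownames(regime) <- paste0("station_", 1:8)

d  <- dist(regime, method = "euclidean")  # assumed distance measure
hc <- hclust(d, method = "ward.D2")       # assumed linkage method
clusters <- cutree(hc, k = 4)             # cut the tree into 4 clusters

# plot(hc) draws the dendrogram; clusters holds one label per station.
```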

1.2 K-means clustering

Figure 4: K-means clustering results for 4 clusters
Figure 5: Regimes for 4 clusters - K-means clustering
Figure 6: Map - K-means clustering
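
A k-means run with 4 clusters can be sketched with `kmeans()` from base R. The toy `regime` matrix, the seed and `nstart = 25` are assumptions for illustration; the settings used for the figures are not given in the report.

```r
# Toy regime matrix: 8 hypothetical stations x 12 months in four groups.
set.seed(7)
months <- 1:12
shapes <- sapply(c(0, 3, 6, 9), function(shift) sin(2 * pi * (months - shift) / 12))
regime <- t(shapes[, rep(1:4, each = 2)]) + matrix(rnorm(8 * 12, sd = 0.05), nrow = 8)
rownames(regime) <- paste0("station_", 1:8)

# nstart > 1 repeats the algorithm from several random starts and keeps
# the solution with the lowest total within-cluster sum of squares.
km <- kmeans(regime, centers = 4, nstart = 25)

km$cluster   # cluster label per station
km$centers   # mean regime per cluster (4 x 12 matrix)
```

Because k-means depends on random starting centres, fixing the seed keeps the cluster numbering reproducible between runs.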

2 Cluster analysis - filtered data

In the ADES dataset, NAs occur only in the variable rel_depth; the variable height appears to be interpolated. For now, the stations were filtered for years with more than 90% available rel_depth data, a start in the year 2000 and a record length of at least 18 years. The variable height is still used for normalisation and clustering, on the assumption that the interpolation is sound.
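
One possible reading of these filter criteria is sketched below in base R. The data frame `df`, its columns and the toy values are assumptions, and the interpretation (only years from 2000 onwards, a year counts as valid if more than 90% of its rel_depth values are present, a station needs at least 18 valid years) is one way to read the rules, not necessarily the exact procedure used.

```r
# Toy daily data for two hypothetical stations, 2000-2019.
dates <- seq(as.Date("2000-01-01"), as.Date("2019-12-31"), by = "day")
df <- data.frame(
  station   = rep(c("A", "B"), each = length(dates)),
  date      = rep(dates, times = 2),
  rel_depth = 1
)
df$year <- as.integer(format(df$date, "%Y"))

# Station B has no data in 2005-2009, leaving it only 15 valid years.
df$rel_depth[df$station == "B" & df$year %in% 2005:2009] <- NA

# Fraction of non-NA rel_depth values per station and year.
# na.action = na.pass keeps the NA rows so they can be counted.
avail <- aggregate(rel_depth ~ station + year, data = df,
                   FUN = function(x) mean(!is.na(x)), na.action = na.pass)
names(avail)[names(avail) == "rel_depth"] <- "frac_available"

# Valid years: from 2000 onwards with more than 90 % available data.
valid <- avail[avail$year >= 2000 & avail$frac_available > 0.9, ]

# Stations with at least 18 valid years.
n_years <- table(valid$station)
keep <- names(n_years)[as.vector(n_years) >= 18]

# Keep only the valid years of the retained stations.
df_filtered <- merge(df, valid[valid$station %in% keep, c("station", "year")])
```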

Figure 7: Data availability before filtering
Figure 8: Data availability after filtering

2.1 K-means clustering

A total of 6 clusters were chosen to see whether it is possible to distinguish better between influenced and uninfluenced piezometers.

Figure 9: K-means clustering results for 6 clusters
Figure 10: Regimes for 6 clusters - K-means clustering

2.2 Random Forest clustering

A Random Forest (RF) model with 5000 trees was trained in unsupervised mode. The clusters were then obtained by applying Partitioning Around Medoids (PAM) to the proximity matrix returned by the RF.

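library(randomForest)  # randomForest()
library(cluster)       # pam()
# No response is supplied, so randomForest() runs in unsupervised mode
rf <- randomForest(x = df_m_wide2, ntree = 5000, proximity = TRUE)
prox <- rf$proximity    # n x n matrix of pairwise proximities
pam.rf <- pam(prox, 6)  # prox is a plain matrix, so pam() treats its rows as observations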
Figure 11: Regimes for 6 clusters - Random Forest clustering
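
`pam()` accepts either a data matrix or a dissimilarity object. As a sketch of the second option, a proximity matrix can be converted to a dissimilarity with 1 - proximity before it is passed to PAM. This is an illustrative variant under that assumption, not the call used for Figure 11, and the small `prox` matrix below is hand-made toy data rather than RF output.

```r
library(cluster)  # pam()

# Toy proximity matrix for six hypothetical stations: two groups of three,
# with high proximity within a group and low proximity between groups.
prox <- matrix(0.1, nrow = 6, ncol = 6)
prox[1:3, 1:3] <- 0.9
prox[4:6, 4:6] <- 0.9
diag(prox) <- 1

# Convert proximity (a similarity) into a dissimilarity object.
diss <- as.dist(1 - prox)

# For "dist" objects pam() sets diss = TRUE automatically.
pam_fit <- pam(diss, k = 2)
pam_fit$clustering  # cluster label per station
```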